
Keyword Search Result

[Keyword] deep learning (149 hits)

Hits 141-149 of 149

  • Deep Nonlinear Metric Learning for Speaker Verification in the I-Vector Space

    Yong FENG  Qingyu XIONG  Weiren SHI  

     
    LETTER-Speech and Hearing

      Publicized:
    2016/10/04
      Vol:
    E100-D No:1
      Page(s):
    215-219

    Speaker verification is the task of determining whether two utterances represent the same person. After representing the utterances in the i-vector space, the crucial problem is how to compute the similarity of two i-vectors. Metric learning has provided a viable solution to this problem. Until now, many metric learning algorithms have been proposed, but they are usually limited to learning a linear transformation. In this paper, we propose a nonlinear metric learning method, which learns an explicit mapping from the original space to an optimal subspace using a deep Restricted Boltzmann Machine network. The proposed method is evaluated on the NIST SRE 2008 dataset. Owing to its deep learning architecture, the proposed method shows superior performance to some state-of-the-art methods in the evaluation.

  • Combining Fisher Criterion and Deep Learning for Patterned Fabric Defect Inspection

    Yundong LI  Jiyue ZHANG  Yubing LIN  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2016/08/08
      Vol:
    E99-D No:11
      Page(s):
    2840-2842

    In this letter, we propose a novel discriminative representation for patterned fabric defect inspection when only limited negative samples are available. The Fisher criterion is introduced into the loss function of deep learning, which guides the learning direction of deep networks and makes the extracted features more discriminative. A deep neural network constructed from the encoder part of trained autoencoders is used to classify each pixel in the images as defective or defect-free, using as context a patch centered on the pixel. Subsequently, the confidence map is processed by median filtering and binary thresholding, and the defect areas are then located. Experimental results demonstrate that our method achieves state-of-the-art performance on the benchmark fabric images.
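
The post-processing step in this abstract (median filtering followed by binary thresholding of a per-pixel confidence map) can be illustrated with a minimal sketch. The 3x3 window, the border handling, the 0.5 threshold, and the function names are all assumptions made for illustration; the abstract does not state the settings the authors used.

```python
from statistics import median


def median_filter_3x3(conf):
    """Apply a 3x3 median filter to a 2-D list of confidences.

    Border pixels use only the neighbors that exist (an assumed convention).
    """
    h, w = len(conf), len(conf[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                conf[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            out[r][c] = median(window)
    return out


def binary_threshold(conf, t=0.5):
    """Mark a pixel as defective (1) when its confidence is at least t."""
    return [[1 if v >= t else 0 for v in row] for row in conf]


def locate_defects(conf, t=0.5):
    """Smooth the confidence map, then binarize it into a defect mask."""
    return binary_threshold(median_filter_3x3(conf), t)
```

The median filter suppresses isolated high-confidence pixels, so only spatially coherent regions survive the threshold.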

  • Food Image Recognition Using Covariance of Convolutional Layer Feature Maps

    Atsushi TATSUMA  Masaki AONO  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2016/02/23
      Vol:
    E99-D No:6
      Page(s):
    1711-1715

    Recent studies have obtained superior performance in image recognition tasks by using, as an image representation, the fully connected layer activations of Convolutional Neural Networks (CNN) trained with various kinds of images. However, the CNN representation is not very suitable for fine-grained image recognition tasks such as food image recognition. To improve the performance of the CNN representation in food image recognition, we propose a novel image representation composed of the covariances of convolutional layer feature maps. In the experiment on the ETHZ Food-101 dataset, our method achieved 58.65% average accuracy, which outperforms previous methods such as the Bag-of-Visual-Words Histogram, the Improved Fisher Vector, and CNN-SVM.
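
As a rough illustration of a covariance-based representation, the sketch below treats each spatial position of a convolutional layer as one sample with C channel values and computes the C x C sample covariance across positions. The input layout, the unbiased normalization, and the use of the upper triangle as the final vector are assumptions; the abstract does not specify how the authors normalize or vectorize the covariances.

```python
def feature_map_covariance(fmaps):
    """Compute the C x C covariance between C feature maps.

    fmaps: list of C feature maps, each an H x W nested list of floats
    (a hypothetical layout chosen for this sketch).
    """
    flat = [[v for row in fm for v in row] for fm in fmaps]  # C x (H*W)
    c, n = len(flat), len(flat[0])
    means = [sum(ch) / n for ch in flat]
    cov = [[0.0] * c for _ in range(c)]
    for i in range(c):
        for j in range(i, c):
            s = sum(
                (flat[i][k] - means[i]) * (flat[j][k] - means[j])
                for k in range(n)
            )
            cov[i][j] = cov[j][i] = s / (n - 1)  # unbiased estimate
    return cov


def upper_triangle(cov):
    """Vectorize a symmetric matrix by keeping its upper triangle."""
    return [cov[i][j] for i in range(len(cov)) for j in range(i, len(cov))]
```

Because the covariance matrix is symmetric, the upper triangle holds all C(C+1)/2 distinct values.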

  • k-Degree Layer-Wise Network for Geo-Distributed Computing between Cloud and IoT

    Yiqiang SHENG  Jinlin WANG  Haojiang DENG  Chaopeng LI  

     
    PAPER

      Vol:
    E99-B No:2
      Page(s):
    307-314

    In this paper, we propose a novel architecture for a deep learning system, named the k-degree layer-wise network, to realize efficient geo-distributed computing between the Cloud and the Internet of Things (IoT). Geo-distributed computing extends the Cloud to the geographical edge of the network, in the neighborhood of the IoT. The basic ideas of the proposal are a k-degree constraint and a layer-wise constraint. The k-degree constraint is defined such that the degree of each vertex on the h-th layer is exactly k(h), in order to extend the existing deep belief networks and control the communication cost. The layer-wise constraint is defined such that the layer-wise degrees are monotonically decreasing in the positive direction, in order to gradually reduce the dimension of the data. We prove that the k-degree layer-wise network is sparse, whereas a typical deep neural network is dense. In an evaluation on the M-distributed MNIST database, the proposal is superior to a state-of-the-art model in terms of communication cost and learning time with scalability.
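
The two constraints can be sketched as simple checks on a list of per-layer degrees. This is only an interpretation of the abstract: it assumes that the degree k(h) counts links from a vertex on layer h to the next layer, that "monotonically decreasing" allows equal neighbors, and that the layer sizes below are arbitrary illustrative values rather than the paper's configuration.

```python
def satisfies_layer_wise_constraint(k):
    """True if the per-layer degrees are positive and never increase."""
    return all(d > 0 for d in k) and all(a >= b for a, b in zip(k, k[1:]))


def count_k_degree_connections(layer_sizes, k):
    """Connections when every vertex on layer h has exactly k[h] links.

    Assumes k[h] counts links toward layer h + 1 only.
    """
    return sum(n * d for n, d in zip(layer_sizes, k))


def count_dense_connections(layer_sizes):
    """Connections in a fully connected network of the same shape."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
```

Comparing the two counts for the same layer sizes shows why a fixed small degree per vertex reduces the number of links, and hence the communication between layers, relative to full connectivity.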

  • A Cascade System of Dynamic Binary Neural Networks and Learning of Periodic Orbit

    Jungo MORIYASU  Toshimichi SAITO  

     
    PAPER

      Publicized:
    2015/06/22
      Vol:
    E98-D No:9
      Page(s):
    1622-1629

    This paper studies a cascade system of dynamic binary neural networks. The system is characterized by a signum activation function, ternary connection parameters, and integer threshold parameters. As a fundamental learning problem, we consider storage and stabilization of one desired binary periodic orbit that corresponds to control signals of switching circuits. For the storage, we present a simple method based on correlation learning. For the stabilization, we present a sparsification method based on the mutation operation of the genetic algorithm. The storage and stability can be investigated using the Gray-code-based return map. Numerical experiments confirm the effectiveness of the learning method.
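
To make the three ingredients named in this abstract concrete (signum activation, ternary connection parameters, integer thresholds), here is a minimal sketch of a single dynamic binary neural network iterated over time. It does not implement the cascade, the correlation learning, or the sparsification; the update rule, the convention that signum of zero is +1, and the hand-chosen weights are assumptions for illustration.

```python
def signum(v):
    """Binary activation: +1 for non-negative input, -1 otherwise (assumed)."""
    return 1 if v >= 0 else -1


def dbnn_step(x, w, t):
    """One update of a binary state vector x with entries in {-1, +1}.

    w: ternary connection matrix with entries in {-1, 0, +1}
    t: integer thresholds, one per neuron
    """
    return [
        signum(sum(wij * xj for wij, xj in zip(row, x)) - ti)
        for row, ti in zip(w, t)
    ]


def iterate(x0, w, t, steps):
    """Return the sequence of states x0, x1, ..., x_steps."""
    states = [list(x0)]
    for _ in range(steps):
        states.append(dbnn_step(states[-1], w, t))
    return states


# Hypothetical example: a cyclic-shift network that holds a period-3 orbit.
W_SHIFT = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
T_ZERO = [0, 0, 0]
ORBIT = iterate([1, -1, -1], W_SHIFT, T_ZERO, 3)
```

With these weights each neuron copies its predecessor, so the state returns to its starting value after three steps.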

  • Learning Deep Dictionary for Hyperspectral Image Denoising

    Leigang HUO  Xiangchu FENG  Chunlei HUO  Chunhong PAN  

     
    LETTER-Pattern Recognition

      Publicized:
    2015/04/20
      Vol:
    E98-D No:7
      Page(s):
    1401-1404

    With traditional single-layer dictionary learning methods, it is difficult to reveal the complex structures hidden in hyperspectral images. Motivated by deep learning techniques, a deep dictionary learning approach is proposed for hyperspectral image denoising, which consists of hierarchical dictionary learning, feature denoising, and fine-tuning. Hierarchical dictionary learning is helpful for uncovering the hidden factors in the spectral dimension, and fine-tuning is beneficial for preserving the spectral structure. Experiments demonstrate the effectiveness of the proposed approach.

  • Voice Conversion Based on Speaker-Dependent Restricted Boltzmann Machines

    Toru NAKASHIKA  Tetsuya TAKIGUCHI  Yasuo ARIKI  

     
    PAPER-Voice Conversion and Speech Enhancement

      Vol:
    E97-D No:6
      Page(s):
    1403-1410

    This paper presents a voice conversion technique using speaker-dependent Restricted Boltzmann Machines (RBM) to build high-order eigenspaces of source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. We build a deep conversion architecture that concatenates the two speaker-dependent RBMs with neural networks, expecting that they automatically discover abstractions to express the original input features. Under this concept, if we train the RBMs using only the speech of an individual speaker that includes various phonemes while keeping the speaker individuality unchanged, it can be considered that there are fewer phonemes and relatively more speaker individuality in the output features of the hidden layer than in the original acoustic features. After training the RBMs for a source speaker and a target speaker, we can then connect and convert the speaker individuality abstractions using Neural Networks (NN). The converted abstraction of the source speaker is then back-propagated into the acoustic space (e.g., MFCC) using the RBM of the target speaker. We conducted speaker-voice conversion experiments and confirmed the efficacy of our method with respect to subjective and objective criteria, comparing it with the conventional Gaussian Mixture Model-based method and an ordinary NN.

  • Nonlinear Metric Learning with Deep Independent Subspace Analysis Network for Face Verification

    Xinyuan CAI  Chunheng WANG  Baihua XIAO  Yunxue SHAO  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E96-D No:12
      Page(s):
    2830-2838

    Face verification is the task of determining whether two given face images represent the same person or not. It is a very challenging task, as face images captured in uncontrolled environments may have large variations in illumination, expression, pose, background, etc. The crucial problem is how to compute the similarity of two face images. Metric learning has provided a viable solution to this problem. Until now, many metric learning algorithms have been proposed, but they are usually limited to learning a linear transformation. In this paper, we propose a nonlinear metric learning method, which learns an explicit mapping from the original space to an optimal subspace using a deep Independent Subspace Analysis (ISA) network. Compared to linear or kernel-based metric learning methods, the proposed deep ISA network is a deep and local learning architecture, and therefore exhibits a more powerful ability to learn the nature of highly variable datasets. We evaluate our method on the Labeled Faces in the Wild dataset, and the results show superior performance over some state-of-the-art methods.

  • Hidden Conditional Neural Fields for Continuous Phoneme Speech Recognition (Open Access)

    Yasuhisa FUJII  Kazumasa YAMAMOTO  Seiichi NAKAGAWA  

     
    PAPER-Speech and Hearing

      Vol:
    E95-D No:8
      Page(s):
    2094-2104

    In this paper, we propose Hidden Conditional Neural Fields (HCNF) for continuous phoneme speech recognition, which are a combination of Hidden Conditional Random Fields (HCRF) and a Multi-Layer Perceptron (MLP), and inherit their merits, namely, the discriminative property for sequences from HCRF and the ability to extract non-linear features from an MLP. HCNF can incorporate many types of features from which non-linear features can be extracted, and is trained by sequential criteria. We first present the formulation of HCNF and then examine three methods to further improve automatic speech recognition using HCNF: an objective function that explicitly considers training errors, a hierarchical tandem-style feature, and a deep non-linear feature extractor for the observation function. Using experimental results for continuous English phoneme recognition on the TIMIT core test set and Japanese phoneme recognition on the IPA 100 test set, we show that HCNF can be trained realistically without any initial model and outperforms HCRF and the triphone hidden Markov model trained with the minimum phone error (MPE) criterion.
